221 research outputs found
The Entropy of the K-Satisfiability Problem
The threshold behaviour of the K-Satisfiability problem is studied in the
framework of the statistical mechanics of random diluted systems. We find that
at the transition the entropy is finite and hence that the transition itself is
due to the abrupt appearance of logical contradictions in all solutions and not
to the progressive decreasing of the number of these solutions down to zero. A
physical interpretation is given for the different cases K=2 and K≥3.
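For reference, in the standard statistical-mechanics setup (notation assumed here, possibly differing from the paper's), one considers random K-SAT formulas with M = \alpha N clauses over N Boolean variables and studies the entropy density of solutions,

    s(\alpha) = \lim_{N \to \infty} \frac{1}{N} \, \overline{\ln \mathcal{N}(\alpha)},

where \mathcal{N}(\alpha) is the number of satisfying assignments and the overline denotes the average over random formulas. The claim above is that s(\alpha) stays strictly positive up to the threshold \alpha_c, so satisfiability is not lost by \mathcal{N} decreasing to zero.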
Efficiency of quantum versus classical annealing in non-convex learning problems
Quantum annealers aim at solving non-convex optimization problems by
exploiting cooperative tunneling effects to escape local minima. The underlying
idea consists in designing a classical energy function whose ground states are
the sought optimal solutions of the original optimization problem and adding a
controllable quantum transverse field to generate tunneling processes. A key
challenge is to identify classes of non-convex optimization problems for which
quantum annealing remains efficient while thermal annealing fails. We show that
this happens for a wide class of problems which are central to machine
learning. Their energy landscapes are dominated by local minima that cause
exponential slowdown of classical thermal annealers, while simulated quantum
annealing converges efficiently to rare dense regions of optimal solutions.
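As a toy illustration of the classical side of this comparison, the sketch below runs Metropolis simulated annealing on a binary perceptron that must store random patterns, a minimal instance of the non-convex learning problems the abstract refers to; sizes, schedule and seed are assumptions, and the quantum counterpart (simulated quantum annealing) would replace the thermal schedule with Suzuki-Trotter replicas under a decreasing transverse field.

import numpy as np

# Minimal sketch: thermal annealing of a binary perceptron (illustrative only).
rng = np.random.default_rng(0)
N, P = 101, 40                            # weights and random patterns (assumed)
xi = rng.choice([-1, 1], size=(P, N))     # random input patterns
y = rng.choice([-1, 1], size=P)           # random target labels
w = rng.choice([-1, 1], size=N)           # binary weights

def energy(w):
    # classical cost: number of misclassified patterns
    return int(np.sum(y * np.sign(xi @ w) <= 0))

E, steps = energy(w), 20_000
for t in range(steps):
    beta = 0.1 + 3.0 * t / steps          # linear inverse-temperature ramp (assumed)
    i = rng.integers(N)
    w[i] *= -1                            # propose a single spin flip
    E_new = energy(w)
    if E_new <= E or rng.random() < np.exp(-beta * (E_new - E)):
        E = E_new                         # Metropolis accept
    else:
        w[i] *= -1                        # reject: undo the flip
print("final number of errors:", E)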
Learning and generalization theories of large committee machines
The study of the distribution of volumes associated with the internal
representations of learning examples allows us to derive the critical learning
capacity of large committee machines, to verify the stability of the solution
in the limit of a large number of hidden units, and to find a Bayesian
generalization cross-over.
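For reference, a tree committee machine with K hidden units (a standard definition; the paper's conventions may differ) splits the input x into disjoint blocks x_1, ..., x_K and outputs

    \sigma(x) = \mathrm{sign}\!\left( \sum_{l=1}^{K} \mathrm{sign}(w_l \cdot x_l) \right),

and the critical capacity \alpha_c is the largest typical number of random examples per weight that such a machine can learn with zero errors.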
On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs
We study the behavior of an algorithm derived from the cavity method for the
Prize-Collecting Steiner Tree (PCST) problem on graphs. The algorithm is based
on the zero temperature limit of the cavity equations and as such is formally
simple (a fixed point equation resolved by iteration) and distributed
(parallelizable). We provide a detailed comparison with state-of-the-art
algorithms on a wide range of existing benchmark networks and random graphs.
Specifically, we consider an enhanced derivative of the Goemans-Williamson
heuristic and the DHEA solver, a Branch-and-Cut Linear/Integer Programming
based approach. The comparison shows that the cavity algorithm outperforms the
two algorithms in most large instances both in running time and quality of the
solution. Finally we prove a few optimality properties of the solutions
provided by our algorithm, including optimality under the two post-processing
procedures defined in the Goemans-Williamson derivative and global optimality
in some limit cases.
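Structurally, the algorithm is a damped iteration of a message fixed-point equation on the edges of the graph. The scaffold below shows that shape only; the PCST-specific cavity update, which the paper derives in the zero temperature limit, is abstracted into a user-supplied function, and the toy update in the usage lines is purely illustrative.

def iterate_messages(edges, update, max_iter=1000, damping=0.5, tol=1e-8):
    # Generic damped fixed-point iteration: msg <- (1-d)*update(msgs, e) + d*msg.
    # The actual PCST cavity update rule is NOT implemented here.
    msgs = {e: 0.0 for e in edges}
    for _ in range(max_iter):
        delta = 0.0
        for e in edges:
            new = (1.0 - damping) * update(msgs, e) + damping * msgs[e]
            delta = max(delta, abs(new - msgs[e]))
            msgs[e] = new
        if delta < tol:
            break
    return msgs

# Toy usage on a directed 3-cycle with a contrived contraction (fixed point 2.0):
edges = [(i, (i + 1) % 3) for i in range(3)]
msgs = iterate_messages(edges, lambda m, e: 0.5 * m[(e[1], (e[1] + 1) % 3)] + 1.0)
print(msgs)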
Efficient LDPC Codes over GF(q) for Lossy Data Compression
In this paper we consider the lossy compression of a binary symmetric source.
We present a scheme that provides a low complexity lossy compressor with near
optimal empirical performance. The proposed scheme is based on b-reduced
ultra-sparse LDPC codes over GF(q). Encoding is performed by the Reinforced
Belief Propagation algorithm, a variant of Belief Propagation. The
computational complexity at the encoder is O(d · n · q · log q), where d is the
average degree of the check nodes. For our code ensemble, decoding can be
performed iteratively following the inverse steps of the leaf removal
algorithm. For a sparse parity-check matrix the number of needed operations is
O(n).
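Reinforced Belief Propagation augments plain BP by feeding each variable's current belief back in as an extra external field, which progressively polarizes the marginals until a single codeword is singled out as the compressed representation. A common form of the variable-to-check update (a sketch; the reinforcement schedule \gamma_t and the paper's exact rule may differ) is

    \nu_{i \to a}^{t+1}(x_i) \;\propto\; \big( b_i^{t}(x_i) \big)^{\gamma_t} \prod_{b \in \partial i \setminus a} \hat{\nu}_{b \to i}^{t}(x_i),

where x_i ranges over GF(q), b_i^t is the current BP estimate of the marginal of x_i, and \gamma_t grows over the iterations.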
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers are often
satisfactory at avoiding overfitting. How these two features can be kept under
control in nonlinear devices composed of millions of tunable connections is a
profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data.
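The observation about SGD and norm reduction lends itself to a compact illustration. The sketch below (sizes, learning rate and shrink factor are assumptions, not the paper's protocol) trains a single-layer unit on random binary patterns with the cross-entropy loss while slowly shrinking the weight norm after each epoch:

import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 150
X = rng.choice([-1.0, 1.0], size=(P, N))   # random patterns
y = rng.choice([0.0, 1.0], size=P)         # random binary labels
w = rng.normal(size=N) / np.sqrt(N)

lr, target_norm = 0.05, float(np.linalg.norm(w))
for epoch in range(500):
    for i in rng.permutation(P):
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))    # sigmoid output
        w -= lr * (p - y[i]) * X[i]            # SGD step on cross-entropy
    target_norm *= 0.999                       # slow norm reduction (assumed rate)
    w *= target_norm / np.linalg.norm(w)       # project onto the shrinking sphere
print("training error:", float(np.mean((X @ w > 0) != (y > 0.5))))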
A three-threshold learning rule approaches the maximal capacity of recurrent neural networks
Understanding the theoretical foundations of how memories are encoded and
retrieved in neural populations is a central challenge in neuroscience. A
popular theoretical scenario for modeling memory function is the attractor
neural network scenario, whose prototype is the Hopfield model. The model has a
poor storage capacity, compared with the capacity achieved with perceptron
learning algorithms. Here, by transforming the perceptron learning rule, we
present an online learning rule for a recurrent neural network that achieves
near-maximal storage capacity without an explicit supervisory error signal,
relying only upon locally accessible information. The fully-connected network
consists of excitatory binary neurons with plastic recurrent connections and
non-plastic inhibitory feedback stabilizing the network dynamics; the memory
patterns are presented online as strong afferent currents, producing a bimodal
distribution for the neuron synaptic inputs. Synapses corresponding to active
inputs are modified as a function of the value of the local fields with respect
to three thresholds. Above the highest threshold, and below the lowest
threshold, no plasticity occurs. In between these two thresholds,
potentiation/depression occurs when the local field is above/below an
intermediate threshold. We simulated and analyzed a network of binary neurons
implementing this rule and measured its storage capacity for different sizes of
the basins of attraction. The storage capacity obtained through numerical
simulations is shown to be close to the value predicted by analytical
calculations. We also measured the dependence of capacity on the strength of
external inputs. Finally, we quantified the statistics of the resulting
synaptic connectivity matrix, and found that both the fraction of zero weight
synapses and the degree of symmetry of the weight matrix increase with the
number of stored patterns.
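The rule itself is compact enough to state in code. The snippet below implements one online presentation step exactly as described above; threshold values, learning rate and the non-negativity clip for excitatory weights are illustrative assumptions:

import numpy as np

def three_threshold_update(W, s, h, th_low, th_mid, th_high, dw=0.01):
    # W: (N, N) plastic excitatory recurrent weights; s: (N,) binary pattern
    # imposed by the strong afferent input; h: (N,) local (recurrent) fields.
    # Only synapses with active presynaptic input are modified.
    active = s == 1
    for i in range(W.shape[0]):
        if h[i] >= th_high or h[i] <= th_low:
            continue                  # no plasticity outside the extreme thresholds
        elif h[i] > th_mid:
            W[i, active] += dw        # potentiation
        else:
            W[i, active] -= dw        # depression
    np.fill_diagonal(W, 0.0)          # no self-connections
    np.clip(W, 0.0, None, out=W)      # keep excitatory weights non-negative
    return W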
Propagation of external regulation and asynchronous dynamics in random Boolean networks
Boolean Networks and their dynamics are of great interest as abstract
modeling schemes in various disciplines, ranging from biology to computer
science. Whereas parallel update schemes have been studied extensively in past
years, the level of understanding of asynchronous update schemes is still very
poor. In this paper we study the propagation of external information given by
regulatory input variables into a random Boolean network. We compute both
analytically and numerically the time evolution and the asymptotic behavior of
this propagation of external regulation (PER). In particular, this allows us to
identify variables which are completely determined by this external
information. All those variables in the network which are not directly fixed by
PER form a core which contains in particular all non-trivial feedback loops. We
design a message-passing approach that allows us to characterize the statistical
properties of these cores as a function of the Boolean network and the external
condition. Finally, we establish a link between PER dynamics and the full
random asynchronous dynamics of a Boolean network.
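A minimal sketch of the PER computation (network construction, sizes and the choice of clamped nodes are assumptions for illustration): clamp the externally regulated variables, then repeatedly mark any node whose Boolean function returns the same value for every completion of its still-undetermined inputs; the unmarked remainder is the core containing all non-trivial feedback loops.

from itertools import product
import numpy as np

rng = np.random.default_rng(2)
N, K = 30, 2                                 # nodes and in-degree (assumed)
inputs = [rng.choice(N, size=K, replace=False) for _ in range(N)]
tables = [rng.integers(0, 2, size=2**K) for _ in range(N)]  # random truth tables

fixed = {0: 1, 1: 0}                         # externally regulated nodes (assumed)
changed = True
while changed:
    changed = False
    for v in range(N):
        if v in fixed:
            continue
        free = [u for u in inputs[v] if u not in fixed]
        vals = set()                         # outputs over all free-input completions
        for assign in product([0, 1], repeat=len(free)):
            bits = [fixed[u] if u in fixed else assign[free.index(u)]
                    for u in inputs[v]]
            idx = sum(b << k for k, b in enumerate(bits))
            vals.add(int(tables[v][idx]))
        if len(vals) == 1:                   # value forced by the external regulation
            fixed[v] = vals.pop()
            changed = True

core = [v for v in range(N) if v not in fixed]
print(len(fixed), "variables fixed by PER; core size", len(core))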
…